library(tidyverse)
library(magrittr)
library(cowplot)

#Data Load In:
urine_raw <- read_csv('~/Documents/Projects/Short_Term_Broccoli/Data/SFN_Targeted/Urine/Urine_R_Format.csv')
metadata <- read_csv('~/Documents/Projects/Short_Term_Broccoli/Data/Metadata/Meta_urine.csv')
#Clean the metadata
mdata_clean <- metadata %>%
  #Weight (grams) used as a  proxy for mL, convert to L
  mutate(across(c(4:9), ~.x/1000)) %>%
  #Pivot the data longer
  pivot_longer(cols = starts_with('vol'), names_to = 'time', values_to = 'volume') %>%
  #Rename the variables so they can treated as factors
  mutate(time = gsub('vol_', '', time)) %>%
  mutate(time = gsub('h','', time)) 

#Clean the urine data
urine_clean <- urine_raw %>%
  #Rename the time points so they match the metadata
  mutate(time = gsub('h','', time)) %>%
  #Join the data with the metadata
  left_join(., mdata_clean) %>%
  group_by(subject_id, time) %>%
  #Multiply all the uM by the (proxy) volumes to convert to uMol recovered
  mutate(across(starts_with('SFN'), ~.x*volume)) %>%
  #Remove the alfalfa groups so we are only focusing on the broccoli
  filter(treatment %in% c('BL', 'BU')) %>%
  #Convert to factors for analysis + graphing
  modify_at(c('subject_id', 'time', 'cohort'), as.factor) 
#Relevel the time variables to make it work
urine_clean$time %<>% fct_relevel(c('0', '3', '6', '24', '48', '72'))

#Convert to tidy data for ease of analysis
urine_tidy <- urine_clean %>%
  pivot_longer(cols = starts_with('SFN'),  names_to = 'metabolite', values_to = 'uMol') 

This is a preliminary analysis of the targeted SFN-metabolite data from the short term broccoli feeding study. The study was run between March 2021 and November 2021. Sample prepartion was completed by Dr. Carmen Wong, Mass Spectrometry analysis was conducted Dr. Jaweoo Choi, and data analysis by Yanni Bouranis.

This analysis is preliminary and still needs further refinement, however, it seeks to answer the following questions:

  1. Do the label and the unlabeled groups differ for total SFN metabolite excretion?

  2. Does total SFN metabolite excretion differ between cohorts?

  3. Does the metabolism of SFN-NAC and SFN-NIT look similar to what we saw in Lauren Atwell’s data?

  4. Which participants were the highest and lowest SFN-NIT and SFN-NAC excreters?

  5. What impact does dose have on the recovery of metabolites?

All plots are interactive and can be zoomed in on and can be hovered over to get more data about specific data points. The legend can be used to isolate points of interest. Single click an item in the legend to hide all members of that class and double click to hide all others but keep that one.

Question 1: Do the label and the unlabeled groups differ for total SFN metabolite excretion?

The first question we want to answer is if the labeled and unlabeled groups differ in the total SFN metabolites (SFN_Tot) excreted at each time point. Let’s start by plotting each time point:

Plots

We will plot our data as both a box and whiskers plot and a dotplot so we can better see inter-individual variation. For the dotplot, the mean for each treatment group is represented as a black diamond. Note: the y-axis scales are not the same to allow us to see differences across metabolites more easily.

Boxplot
DvH_box <- ggplot(urine_clean, aes(x = treatment, y = SFN_Tot, fill = treatment)) + 
  geom_boxplot() +
  facet_wrap(~time, scales = 'free_y') +
  xlab('Treatment') + 
  ylab('Total SFN Metabolite Excretion (µMol)') +
  ggtitle('Total SFN Metabolite Excretion') +
  scale_x_discrete(labels = c('Labeled', 'Unlabeled')) +
  cowplot::theme_minimal_grid() +
  theme(legend.position = 'none')

plotly::ggplotly(DvH_box)
Dotplot
DvH_ind <- ggplot(urine_clean, aes(x = treatment, y = SFN_Tot, color = subject_id)) + 
  geom_point(aes(text = paste0('Participant ID: ', subject_id, '\n',
                               'Cohort: ', cohort, '\n',
                               'Total Excretion: ', round(SFN_Tot, 3), ' µMol \n'))) +
  geom_point(aes(group = time),stat = 'summary', fun = mean, fill = 'grey', color = 'black', alpha = 0.9, size = 3.6, shape = 18) +
  facet_wrap(~time, scales = 'free_y') +
  cowplot::theme_minimal_grid() +
  xlab('Treatment') + 
  ylab('Total SFN Metabolite Excretion (µMol)') +
  ggtitle('Total SFN Metabolite Excretion') +
  scale_x_discrete(labels = c('Labeled', 'Unlabeled')) +
  scale_color_discrete(name = 'Participant \n ID')

plotly::ggplotly(DvH_ind, tooltip = 'text')

From our plots, it looks like there is not much difference across our groups. Let’s verify that by running some stats.

Stats

For ease of analysis, instead of using a linear mixed model we will stratify our data by time point and use a simple Mann-Whitney U-Test (nonparametric t-test) to compare between the Label and the Unlabeled group. The interpretation of these results is that a significant p-value denotes that there is a significant difference between the labeled- and unlabeled-samples for total SFN metabolite excretion at each time point.

labelstats <- urine_tidy %>%
  filter(metabolite == 'SFN_Tot') %>%
  group_by(time) %>%
  nest() %>%
  mutate(pval = map_dbl(data, function(x) wilcox.test(uMol ~ treatment, data = x)$p.val)) %>%
  dplyr::select(-data) %>%
  rename('Time' = time, 'p-Value' = pval)

knitr::kable(labelstats)
Time p-Value
0 0.6547343
3 0.7985884
6 0.0850208
24 0.2836463
48 0.1458033
72 0.2797874

Unsurprisingly, there is no significant differences between our labeled and unlabeled groups at any time points.

Question 2: Does total SFN metabolite excretion differ between cohorts?

Our next goal is to evaluate if there is a significant cohort effect occurring when it comes to total SFN metabolite excretion. The cohorts were run during different times of years which could influence the growth of the sprouts (i.e. light, heat conditions), as well as different biological conditions occurring such as weight, activity level, stress level, and water intake of our participants.

Let’s start by looking at the SFN-dose and grams of sprouts fed to each cohort:

Bar Graph
sproutdata <- read_csv('../Data/Metadata/sprout_meta.csv') %>%
  modify_at('cohort', as.factor) 

spd <- ggplot(sproutdata, aes(x = cohort , y = SFN_fed, fill = treatment)) + 
  geom_col(position = 'dodge2',
           aes(text = paste0('Cohort: ', cohort, '\n',
                             'Grams Sprouts Fed: ', grams_fed, 'g \n',
                             'SFN-Equivalents Fed: ', SFN_fed, ' µmol \n',
                             'Seed Batch: ', seed_batch, '\n',
                             'Tretment: ', ifelse(treatment == 'BL', 'Labeled', 'Unlabeled')))) +
  cowplot::theme_minimal_hgrid() +
  xlab('Cohort') + 
  ylab('SFN-Equivalents Fed (µMol)') +
  ggtitle('SFN-Equivalents Fed by Cohort')



plotly::ggplotly(spd, tooltip = 'text')
Scatter Plot
svg <- ggplot(sproutdata, aes(x = SFN_fed, y = grams_fed, color = cohort)) + 
  geom_point(aes(text = paste0('Cohort: ', cohort, '\n',
                             'Grams Sprouts Fed: ', grams_fed, 'g \n',
                             'SFN-Equivalents Fed: ', SFN_fed, ' µmol \n',
                             'Seed Batch: ', seed_batch, '\n',
                             'Treatment: ', ifelse(treatment == 'BL', 'Labeled', 'Unlabeled'))), size = 4) +
  cowplot::theme_minimal_grid() +
  xlab('SFN-Equivalents Fed (µMol)') +
  ylab('Grams of Sprouts Fed (g)') +
  ggtitle('SFN Equivalents Fed vs Grams Fed')

plotly::ggplotly(svg, tooltip = 'text')
Table
shr <- sproutdata %>%
  mutate(treatment = ifelse(treatment == 'BL', 'Labeled', 'Unlabeled')) %>%
  rename('Cohort' = cohort, 'Treatment' = treatment, 'Grams Fed' = grams_fed, 'SFN Fed' = SFN_fed, 'Seed Batch' = seed_batch)


knitr::kable(shr)
Cohort Treatment Grams Fed SFN Fed Seed Batch
1 Unlabeled 71.00 135.8 1
1 Labeled 58.00 160.5 1
2 Unlabeled 64.00 151.6 1
2 Labeled 54.00 161.7 1
3 Unlabeled 40.00 81.3 1
3 Labeled 35.00 89.2 1
4 Unlabeled 30.40 100.0 1
4 Labeled 41.60 100.0 1
5 Unlabeled 47.10 100.0 1
5 Labeled 36.38 100.0 1
6 Unlabeled 29.03 100.0 1
6 Labeled 29.66 100.0 1
7 Unlabeled 28.90 100.0 1
7 Labeled 22.00 100.0 1
8 Unlabeled 26.50 100.0 2
8 Labeled 34.50 100.0 2

Plots

Now we will plot the total SFN-metabolites recovered as both a boxplot and dotplot. Note: the y-axis scales are not the same to allow us to see differences across metabolites more easily.

Boxplots
CoI_box <- ggplot(urine_clean, aes(x = cohort, y = SFN_Tot, fill = cohort)) +
  geom_boxplot() +
  facet_wrap(~time, scales = 'free_y') +
  cowplot::theme_minimal_grid() +
  xlab('Cohort') + 
  ylab('Total SFN Metabolite Excretion (µMol)') +
  ggtitle('Total SFN Metabolite Excretion') +
  theme(legend.position = 'none')

plotly::ggplotly(CoI_box)
Dotplots
CoI_ind <- ggplot(urine_clean, aes(x = cohort, y = SFN_Tot, color = subject_id, group = cohort)) + 
  geom_point(aes(text = paste0('Participant ID: ', subject_id, '\n',
                               'Cohort: ', cohort, '\n',
                               'Total Excretion: ', round(SFN_Tot, 3), ' µMol \n',
                               'Treatment: ', ifelse(treatment == 'BL', 'Labeled', 'Unlabeled')))) +
  geom_point(stat = 'summary', fun = mean, fill = 'grey', color = 'black', alpha = 0.9, size = 3.6, shape = 18) +
  facet_wrap(~time, scales = 'free_y') +
  cowplot::theme_minimal_grid() +
  xlab('Cohort') + 
  ylab('Total SFN Metabolite Excretion (µMol)') +
  ggtitle('Total SFN Metabolite Excretion') +
  scale_color_discrete(name = 'Participant \n ID')


plotly::ggplotly(CoI_ind, tooltip = 'text')

Overall, it looks like most cohorts are quite similar, however, Cohorts 1 and 2 definitely have higher levels of excretion compared to the rest. For the 0h time point, it looks like participants BSS007 and BSS013 in Cohorts 1 and 2, respectively, made have made a mistake during their dietary restrictions too.

Let’s see how this plays out in our stats:

Stats

For ease of analysis, data will be stratified by time point and then analyzed across cohorts using an ANOVA. The interpretation of these results it that a significant p-value denotes a significant difference between at least 2 cohorts for that time point.

cohortstats <- urine_tidy %>%
  filter(metabolite == 'SFN_Tot') %>%
  group_by(time) %>%
  nest() %>%
  mutate(pval = map_dbl(data, function(x) anova(lm(uMol ~ cohort, data = x))$"Pr(>F)"[1])) %>%
  dplyr::select(-data) %>%
  rename('Time' = time, 'p-Value' = pval)

knitr::kable(cohortstats)
Time p-Value
0 0.3288060
3 0.2021590
6 0.0007957
24 0.0001501
48 0.0000024
72 0.0014641

It looks like at our later time points we get see significant differences. From looking at our plots, I can assume that this is being driven by cohorts 1 and 2 which were fed higher amounts of sulforaphane than the other cohorts. Let’s verify this by looking at the results of the linear model we made to do our ANOVA.

To interpret this output, we look at the Coefficients section of the output. (Intercept) is Cohort 1 and Estimate for this coefficient is the estimated mean of this cohort for that time point. For the other cohorts, Estimate represents the difference between its mean and the mean of Cohort 1. Pr(>|t|) is the p-value resulting from a t-test between each cohort and Cohort 1 (the (Intercept) term) - (ignore the p-value associated with the (Intercept) term). If a coefficient (Cohort) is found to be significant, it can be interpreted that the mean in total SFN metabolites excreted between that Cohort and Cohort 1 are significantly different.

cdata_list <- urine_tidy %>%
  filter(metabolite == 'SFN_Tot') %>%
  group_by(time) %>%
  nest() 

walk2(.x = cdata_list$data, 
      .y = list('0h', '3h', '6h', '24h', '48h', '72h'),
      function(x,y){
        print(y)
        print(summary(lm(uMol ~ cohort, data = x)))
      })
## [1] "0h"
## 
## Call:
## lm(formula = uMol ~ cohort, data = x)
## 
## Residuals:
##      Min       1Q   Median       3Q      Max 
## -0.37942 -0.00285  0.00000  0.00000  1.13825 
## 
## Coefficients:
##             Estimate Std. Error t value Pr(>|t|)   
## (Intercept)   0.3794     0.1225   3.098   0.0042 **
## cohort2      -0.3025     0.1732  -1.746   0.0910 . 
## cohort3      -0.3710     0.1732  -2.142   0.0404 * 
## cohort4      -0.3794     0.1581  -2.400   0.0228 * 
## cohort5      -0.3794     0.1871  -2.028   0.0515 . 
## cohort6      -0.3766     0.1581  -2.382   0.0238 * 
## cohort7      -0.3794     0.1500  -2.530   0.0169 * 
## cohort8      -0.3794     0.1871  -2.028   0.0515 . 
## ---
## Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
## 
## Residual standard error: 0.2449 on 30 degrees of freedom
## Multiple R-squared:  0.2199, Adjusted R-squared:  0.03786 
## F-statistic: 1.208 on 7 and 30 DF,  p-value: 0.3288
## 
## [1] "3h"
## 
## Call:
## lm(formula = uMol ~ cohort, data = x)
## 
## Residuals:
##     Min      1Q  Median      3Q     Max 
## -23.901  -9.074  -2.494   6.986  29.597 
## 
## Coefficients:
##             Estimate Std. Error t value Pr(>|t|)    
## (Intercept)   48.139      7.200   6.686 2.47e-07 ***
## cohort2       -9.384     10.183  -0.922   0.3644    
## cohort3      -13.991     10.183  -1.374   0.1800    
## cohort4      -18.596      9.660  -1.925   0.0641 .  
## cohort5      -26.478     10.999  -2.407   0.0227 *  
## cohort6      -24.012      9.296  -2.583   0.0151 *  
## cohort7      -21.234      8.819  -2.408   0.0226 *  
## cohort8      -21.374     10.999  -1.943   0.0617 .  
## ---
## Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
## 
## Residual standard error: 14.4 on 29 degrees of freedom
##   (1 observation deleted due to missingness)
## Multiple R-squared:  0.2675, Adjusted R-squared:  0.09069 
## F-statistic: 1.513 on 7 and 29 DF,  p-value: 0.2022
## 
## [1] "6h"
## 
## Call:
## lm(formula = uMol ~ cohort, data = x)
## 
## Residuals:
##     Min      1Q  Median      3Q     Max 
## -31.232 -14.307  -3.131  10.101  56.011 
## 
## Coefficients:
##             Estimate Std. Error t value Pr(>|t|)    
## (Intercept)    75.04      10.08   7.443  2.7e-08 ***
## cohort2        12.92      14.26   0.906  0.37215    
## cohort3       -34.07      14.26  -2.390  0.02334 *  
## cohort4       -40.77      13.01  -3.133  0.00385 ** 
## cohort5       -26.52      15.40  -1.722  0.09536 .  
## cohort6       -45.71      13.01  -3.512  0.00143 ** 
## cohort7       -40.93      12.35  -3.315  0.00240 ** 
## cohort8       -30.47      15.40  -1.979  0.05710 .  
## ---
## Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
## 
## Residual standard error: 20.16 on 30 degrees of freedom
## Multiple R-squared:  0.5374, Adjusted R-squared:  0.4294 
## F-statistic: 4.978 on 7 and 30 DF,  p-value: 0.0007957
## 
## [1] "24h"
## 
## Call:
## lm(formula = uMol ~ cohort, data = x)
## 
## Residuals:
##     Min      1Q  Median      3Q     Max 
## -34.500  -7.647  -0.751   9.526  36.503 
## 
## Coefficients:
##             Estimate Std. Error t value Pr(>|t|)    
## (Intercept)   83.641      8.695   9.619 1.12e-10 ***
## cohort2       -6.444     12.297  -0.524  0.60410    
## cohort3      -41.513     12.297  -3.376  0.00205 ** 
## cohort4      -47.142     11.225  -4.200  0.00022 ***
## cohort5      -47.374     13.282  -3.567  0.00124 ** 
## cohort6      -52.518     11.225  -4.679 5.77e-05 ***
## cohort7      -50.858     10.649  -4.776 4.39e-05 ***
## cohort8      -37.519     13.282  -2.825  0.00833 ** 
## ---
## Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
## 
## Residual standard error: 17.39 on 30 degrees of freedom
## Multiple R-squared:  0.5919, Adjusted R-squared:  0.4967 
## F-statistic: 6.217 on 7 and 30 DF,  p-value: 0.0001501
## 
## [1] "48h"
## 
## Call:
## lm(formula = uMol ~ cohort, data = x)
## 
## Residuals:
##     Min      1Q  Median      3Q     Max 
## -4.1168 -1.8467  0.1342  1.7926  4.9284 
## 
## Coefficients:
##             Estimate Std. Error t value Pr(>|t|)    
## (Intercept)   14.894      1.393  10.689 9.48e-12 ***
## cohort2       -2.503      1.971  -1.270 0.213690    
## cohort3       -8.655      1.971  -4.392 0.000129 ***
## cohort4       -8.848      1.799  -4.919 2.93e-05 ***
## cohort5       -7.369      2.128  -3.462 0.001632 ** 
## cohort6      -11.503      1.799  -6.395 4.65e-07 ***
## cohort7      -10.978      1.707  -6.433 4.18e-07 ***
## cohort8       -9.771      2.128  -4.591 7.38e-05 ***
## ---
## Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
## 
## Residual standard error: 2.787 on 30 degrees of freedom
## Multiple R-squared:  0.6978, Adjusted R-squared:  0.6273 
## F-statistic: 9.896 on 7 and 30 DF,  p-value: 2.403e-06
## 
## [1] "72h"
## 
## Call:
## lm(formula = uMol ~ cohort, data = x)
## 
## Residuals:
##      Min       1Q   Median       3Q      Max 
## -2.08313 -0.43230 -0.04454  0.36891  2.85011 
## 
## Coefficients:
##             Estimate Std. Error t value Pr(>|t|)    
## (Intercept)   3.1035     0.4780   6.493 3.55e-07 ***
## cohort2      -0.8936     0.6760  -1.322 0.196200    
## cohort3      -2.2756     0.6760  -3.366 0.002101 ** 
## cohort4      -2.3524     0.6171  -3.812 0.000638 ***
## cohort5      -1.9318     0.7302  -2.646 0.012851 *  
## cohort6      -2.7888     0.6171  -4.519 9.02e-05 ***
## cohort7      -2.6567     0.5854  -4.538 8.56e-05 ***
## cohort8      -1.9976     0.7302  -2.736 0.010348 *  
## ---
## Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
## 
## Residual standard error: 0.956 on 30 degrees of freedom
## Multiple R-squared:  0.5152, Adjusted R-squared:  0.4021 
## F-statistic: 4.554 on 7 and 30 DF,  p-value: 0.001464

As suspected, Cohorts 1 and 2 are higher than all other cohorts at all time points. Let’s remove Cohorts 1 and 2 from our data and then try again to see if it still comes out as significantly different.

cohortstats_2 <- urine_tidy %>%
  #Filter down to only total sulforaphane content
  filter(metabolite == 'SFN_Tot') %>%
  #Remove cohorts 1 and 2
  filter(!cohort %in% c(1,2)) %>%
  group_by(time) %>%
  nest() %>%
  mutate(pval = map_dbl(data, function(x) anova(lm(uMol ~ cohort, data = x))$"Pr(>F)"[1])) %>%
  dplyr::select(-data) %>%
  rename('Time' = time, 'p-Value' = pval)

knitr::kable(cohortstats_2)
Time p-Value
0 0.3210257
3 0.8174697
6 0.6086423
24 0.7824433
48 0.2127658
72 0.2650680

As expected, Cohorts 1 and 2 were driving the significance that we observed earlier.

How do we want to handle these cohorts for future analyses?

Question 3: What does metabolism look like?

SFN-NAC

Next we want to see the metabolism of SFN-NAC and SFN-NIT in our participants. To control for variable urine production between individuals, µM (as reported by Carmen) has been converted to µMol recovered in urine using urine weight as a proxy for volume. We will start by looking at the metabolism of SFN-NAC by plotting it by both subject, to see inter-individual variation, and by Cohort to see means across cohorts.

Hovering over any data point will give you more information about that point and the plot can be zoomed in on. The grey bars represent the mean at each timepoint.

By Participant
NACall_sub <- ggplot(urine_clean, aes(x = time, y = SFN_NAC, group = subject_id, color = subject_id)) +
  geom_bar(aes(group = time),stat = 'summary', fun = mean, fill = 'grey', color = 'black', alpha = 0.5) +
  geom_point(size = 2, aes(text = paste0('Participant ID: ', subject_id, '\n',
                               'Cohort: ', cohort, '\n',
                               'Time Point: ', time, ' hours \n',
                               'Total Excretion: ', round(SFN_NAC, 3), ' µMol \n',
                               'Treatment: ', ifelse(treatment == 'BL', 'Labeled', 'Unlabeled')))) +
  geom_path(size = 0.5) + 
  theme_minimal_hgrid() +
  ylab('SFN-NAC (µmol)') +
  xlab('Time (Hours)') +
  ggtitle('SFN-NAC Excretion (µMol) by Participant') +
  scale_color_discrete(name = 'Participant \n ID')

plotly::ggplotly(NACall_sub, tooltip = 'text')
By Cohort
ser <- function(x){
  sd(x)/sqrt(length(x))
}  

summary_NAC <- urine_tidy %>%
  filter(metabolite == 'SFN_NAC') %>%
  group_by(time, cohort) %>%
  drop_na() %>%
  summarise(mean = round(mean(uMol, na.rm = TRUE),3),
            SE = round(ser(uMol), 3),
            n = n())

NACsum <- ggplot(summary_NAC, aes(x = time, y = mean, fill = cohort)) +
  geom_errorbar(aes(ymin = mean - SE, ymax = mean+SE,
                    text = paste('Standard Error: ±', SE, ' µMol')), width = 0.2) +
  geom_col(aes(text = paste0('Cohort: ', cohort, '\n',
                             'Time: ', time, '\n',
                             'Mean SFN-NAC: ', round(mean, 3), ' µmol \n',
                             'n: ', n))) +
  theme_minimal_hgrid() +
  ylab('SFN-NAC (µmol)') +
  xlab('Time (Hours)') +
  facet_wrap(~cohort, ncol = 4) +
  ggtitle('SFN-NAC Excretion by Cohort') +
  theme(legend.position = 'none')

plotly::ggplotly(NACsum, tooltip = 'text')

SFN-NIT

Next we will look at SFN-NIT metabolism. To recap, in Lauren Atwell’s data we saw an increase in metabolism from 0h to 24 hours then for some participants there was a decline between 24 and 48 hours while other saw an increase.

By Pariticipant:
NITall_sub <- ggplot(urine_clean, aes(x = time, y = SFN_NIT, group = subject_id, color = subject_id)) +
  geom_bar(aes(group = time),stat = 'summary', fun = mean, fill = 'grey', color = 'black', alpha = 0.5) +
  geom_point(size = 2, aes(text = paste0('Participant ID: ', subject_id, '\n',
                               'Cohort: ', cohort, '\n',
                               'Time Point: ', time, ' hours \n',
                               'SFN-NIT Excretion: ', round(SFN_NIT, 3), ' µMol \n',
                               'Treatment: ', ifelse(treatment == 'BL', 'Labeled', 'Unlabeled')))) +
  geom_path(size = 0.5) + 
  theme_minimal_hgrid() +
  ylab('SFN-NIT (µmol)') +
  xlab('Time (Hours)') +
  ggtitle('SFN-NIT Excretion (µMol) by Participant') +
  scale_color_discrete(name = 'Participant \n ID')

plotly::ggplotly(NITall_sub, tooltip = 'text')
By Cohort
summary_NIT <- urine_tidy %>%
  filter(metabolite == 'SFN_NIT') %>%
  group_by(time, cohort) %>%
  drop_na() %>%
  summarise(mean = round(mean(uMol, na.rm = TRUE),3),
            SE = round(ser(uMol), 3),
            n = n())

NITsum <- ggplot(summary_NIT, aes(x = time, y = mean, fill = cohort)) +
  geom_errorbar(aes(ymin = mean - SE, ymax = mean+SE,
                    text = paste('Standard Error: ±', SE, ' µMol')), width = 0.2) +
  geom_col(aes(text = paste0('Cohort: ', cohort, '\n',
                             'Time: ', time, '\n',
                             'Mean SFN-NIT: ', round(mean, 3), ' µmol \n',
                             'n: ', n))) +
  theme_minimal_hgrid() +
  ylab('SFN-NIT (µmol)') +
  xlab('Time (Hours)') +
  facet_wrap(~cohort, ncol = 4) +
  theme(legend.position = 'none')

plotly::ggplotly(NITsum, tooltip = 'text')

Percent Dose Recovered:

Lastly we will look at the percent of the dose fed to help control for the unequal doses fed to Cohorts 1-3. Percent recovery is calculated by dividing the total SFN-metabolites recovered at each time point by the dose fed to that cohort.

By Pariticipant:
pct_data <- urine_clean %>%
  dplyr::select(subject_id, treatment, cohort, time, SFN_Tot) %>%
  left_join(sproutdata) %>%
  mutate(pct_rec = SFN_Tot/SFN_fed) 
  
pctI <- ggplot(pct_data, aes(x = time, y = pct_rec, group = subject_id, color = subject_id)) +
  geom_bar(aes(group = time),stat = 'summary', fun = mean, fill = 'grey', color = 'black', alpha = 0.5) +
  geom_point(size = 2, aes(text = paste0('Participant ID: ', subject_id, '\n',
                               'Cohort: ', cohort, '\n',
                               'Time Point: ', time, ' hours \n',
                               'Total Metabolites Recoved: ', round(SFN_Tot, 3), ' µMol \n',
                               '% Dose Recovered: ', round(pct_rec*100, 2), '% \n',
                               'SFN Dose Fed: ', round(SFN_fed, 3), ' µmol \n',
                               'Treatment: ', ifelse(treatment == 'BL', 'Labeled', 'Unlabeled')))) +
  geom_path(size = 0.5) + 
  theme_minimal_hgrid() +
  ylab('% Dose Recovered') +
  xlab('Time (Hours)') +
  ggtitle('% Dose Recovered by Participant') +
  scale_color_discrete(name = 'Participant \n ID') +
  scale_y_continuous(labels = scales::label_percent())

plotly::ggplotly(pctI, tooltip = 'text')
By Cohort
pct_sum <- pct_data %>%
  group_by(time, cohort) %>%
  drop_na() %>%
  summarise(mean = round(mean(pct_rec, na.rm = TRUE),3),
            SE = round(ser(pct_rec), 3),
            n = n(),
            sf = mean(SFN_fed))

PCTsum <- ggplot(pct_sum, aes(x = time, y = mean, fill = cohort)) +
  geom_errorbar(aes(ymin = mean - SE, ymax = mean+SE,
                    text = paste('Standard Error: ±', round(SE*100, 2), ' %')), width = 0.2) +
  geom_col(aes(text = paste0('Cohort: ', cohort, '\n',
                             'Time: ', time, '\n',
                             'Mean % Recovered: ', round(mean*100, 2), '% \n',
                             'SFN Dose Fed: ', sf, ' µmol \n', 
                             'n: ', n))) +
  theme_minimal_hgrid() +
  ylab('% Dose Recovered') +
  xlab('Time (Hours)') +
  ggtitle('% Dose Recovered by Cohort') +
  facet_wrap(~cohort, ncol = 4) +
  scale_y_continuous(labels = scales::label_percent()) +
  theme(legend.position = 'none')

plotly::ggplotly(PCTsum, tooltip = 'text')

Interestingly, it looks like pattern in SFN-NIT that we saw in Lauren Atwell’s data isn’t observed here. One explanation could be the use of urine weight as a proxy for volume as opposed to true volume, as we used in Lauren Atwell’s data. The pattern in SFN-NAC excretion is also different from what we observed in Lauren’ Atwell’s data. The excretion of SFN-NIT between at 24 hours is fairly similar to what we saw in Lauren Atwell’s study, however. In her study, participants consumed 200 µmol of SFN-equivalents and ~23 µmol of SFN-NIT was recovered in urine at 24 hours, while in our study our particpants consumed 100 µmol of SFN-equivalents at ~12 µmol of SFN-NIT was recovered at 24 hours. In the Atwell study, there was also a 12 hour time point while we only looked at 24 hours which could also explain some of the differences we observed.

Lastly, let’s see at what time points SFN-NIT and SFN-NAC excretion was significantly different compared to 0hr. To evaluate this, we will use a nested mixed effects model. We will be using the model formula: uMol ~ time + (1|cohort/subject_id)

Interpretation of the results: Compared to 0h, does this time point significantly differ compare in µmol recovered?

library(lme4)
test <- urine_tidy %>%
  filter(metabolite == 'SFN_NIT')

#Fit our model:
lm_time <- urine_tidy %>%
  filter(metabolite != 'SFN_Tot') %>%
  group_by(metabolite) %>%
  nest() %>%
  #Run the model:
  mutate(model = map(data, function(x) lmer(uMol ~ time + (1|cohort/subject_id), data = x))) %>%
  #Run the contrasts across each time point
  mutate(pval_3h = map_dbl(model, function(x) lmerTest::contest(x, c(0,1,0,0,0,0))$'Pr(>F)')) %>%
  mutate(pval_6h = map_dbl(model, function(x) lmerTest::contest(x, c(0,0,1,0,0,0))$'Pr(>F)')) %>%
  mutate(pval_24h = map_dbl(model, function(x) lmerTest::contest(x, c(0,0,0,1,0,0))$'Pr(>F)')) %>%
  mutate(pval_48h = map_dbl(model, function(x) lmerTest::contest(x, c(0,0,0,0,1,0))$'Pr(>F)')) %>%
  mutate(pval_72h = map_dbl(model, function(x) lmerTest::contest(x, c(0,0,0,0,0,1))$'Pr(>F)')) %>%
  dplyr::select(-c(data, model)) %>%
  rename('Metabolite' = metabolite, '3h p-Value' = pval_3h, '6h p-Value' = pval_6h, '24h p-Value' = pval_24h,
         '48h p-Value' = pval_48h, '72h p-Value' = pval_72h)

#Round the data to make it more readable
lmt_rounded <- lm_time %>% 
  modify_if(is.numeric, round, 4)

knitr::kable(lmt_rounded)
Metabolite 3h p-Value 6h p-Value 24h p-Value 48h p-Value 72h p-Value
SFN 0.0000 0.0000 0.0000 0.9943 0.9943
SFN_CYS 0.0000 0.0000 0.0000 0.8433 0.9819
SFN_NAC 0.0000 0.0000 0.0000 0.2817 0.8287
SFN_CG 0.6383 0.2571 0.0476 0.9956 0.9956
SFN_GSH 0.2243 0.0015 1.0000 1.0000 1.0000
SFN_NIT 0.0010 0.0000 0.0000 0.0000 0.4688

Question 4: Which participants were the highest and lowest SFN-NIT and SFN-NAC excreters?

To address this problem we can take three approaches:

  1. Find participants who excreted the highest cumulative amount SFN-NIT and SFN-NAC across all times points

  2. Find which participants excreted the maximum amount of SFN-NIT and SFN-NAC at a single time point.

  3. Find which participants had the greatest percent recovery using the two methods above

Cumulative Excretion

We will use our first strategy to look at cumulative metabolite production of NAC and NIT across all time points.

SFN-NAC
cumsum_NAC <- urine_tidy %>%
  filter(metabolite == 'SFN_NAC') %>%
  drop_na() %>%
  group_by(subject_id, cohort, treatment) %>%
  nest() %>%
  mutate(data = map_dbl(data, function(x) sum(x$uMol))) %>%
  ungroup()

cumsum_NAC_L <- cumsum_NAC %>% filter(treatment == 'BL')

fillvec_L <- rep('', length(cumsum_NAC_L$data))
fillvec_L[which(cumsum_NAC_L$data %in% head(cumsum_NAC_L$data[order(cumsum_NAC_L$data, decreasing = TRUE)], 8))] <- 'Highest Labeled'
fillvec_L[which(cumsum_NAC_L$data %in% tail(cumsum_NAC_L$data[order(cumsum_NAC_L$data, decreasing = TRUE)], 8))] <- 'Lowest Labeled'
cumsum_NAC_L$maxmin <- fillvec_L

cumsum_NAC_U <- cumsum_NAC %>% filter(treatment == 'BU')

fillvec_U <- rep('', length(cumsum_NAC_U$data))
fillvec_U[which(cumsum_NAC_U$data %in% head(cumsum_NAC_U$data[order(cumsum_NAC_U$data, decreasing = TRUE)], 8))] <- 'Highest Unlabeled'
fillvec_U[which(cumsum_NAC_U$data %in% tail(cumsum_NAC_U$data[order(cumsum_NAC_U$data, decreasing = TRUE)], 8))] <- 'Lowest Unlabeled'
cumsum_NAC_U$maxmin <- fillvec_U

csNAC_full <- rbind(cumsum_NAC_L, cumsum_NAC_U)

csoutNAC <- csNAC_full %>% 
  dplyr::select(subject_id, cohort, treatment, data, maxmin) %>%
  mutate(treatment = ifelse(treatment == 'BL', 'Labeled', 'Unlabeled')) %>%
  mutate(data = round(data, 3)) %>%
  filter(maxmin != '') %>%
  arrange(subject_id) %>%
  rename('Participant' = subject_id, 'Cohort' = cohort, 'Cumulative µMol Excreted' = data, 'Label' = treatment, 'Highest or Lowest Producer' = maxmin)

DT::datatable(csoutNAC,
           filter = 'top',
           extensions = 'Buttons',
           options = list(pageLength = 12,
                          dom = 'Bfrtip',
                          scrollX = TRUE))  
SFN-NIT
cumsum_NIT <- urine_tidy %>%
  filter(metabolite == 'SFN_NIT') %>%
  drop_na() %>%
  group_by(subject_id, cohort, treatment) %>%
  nest() %>%
  mutate(data = map_dbl(data, function(x) sum(x$uMol))) %>%
  ungroup()

cumsum_NIT_L <- cumsum_NIT %>% filter(treatment == 'BL')

fillvec_L <- rep('', length(cumsum_NIT_L$data))
fillvec_L[which(cumsum_NIT_L$data %in% head(cumsum_NIT_L$data[order(cumsum_NIT_L$data, decreasing = TRUE)], 8))] <- 'Highest Labeled'
fillvec_L[which(cumsum_NIT_L$data %in% tail(cumsum_NIT_L$data[order(cumsum_NIT_L$data, decreasing = TRUE)], 8))] <- 'Lowest Labeled'
cumsum_NIT_L$maxmin <- fillvec_L

cumsum_NIT_U <- cumsum_NIT %>% filter(treatment == 'BU')

fillvec_U <- rep('', length(cumsum_NIT_U$data))
fillvec_U[which(cumsum_NIT_U$data %in% head(cumsum_NIT_U$data[order(cumsum_NIT_U$data, decreasing = TRUE)], 8))] <- 'Highest Unlabeled'
fillvec_U[which(cumsum_NIT_U$data %in% tail(cumsum_NIT_U$data[order(cumsum_NIT_U$data, decreasing = TRUE)], 8))] <- 'Lowest Unlabeled'
cumsum_NIT_U$maxmin <- fillvec_U

csNIT_full <- rbind(cumsum_NIT_L, cumsum_NIT_U)

csoutNIT <- csNIT_full %>% 
  dplyr::select(subject_id, cohort, treatment, data, maxmin) %>%
  mutate(treatment = ifelse(treatment == 'BL', 'Labeled', 'Unlabeled')) %>%
  mutate(data = round(data, 3)) %>%
  filter(maxmin != '') %>%
  arrange(subject_id) %>%
  rename('Participant' = subject_id, 'Cohort' = cohort, 'Cumulative µMol Excreted' = data, 'Label' = treatment, 'Highest or Lowest Producer' = maxmin)

DT::datatable(csoutNIT,
           filter = 'top',
           extensions = 'Buttons',
           options = list(pageLength = 12,
                          dom = 'Bfrtip',
                          scrollX = TRUE))  
Total Metabolites % Recovered
cumsum_pct <- urine_tidy %>%
  filter(metabolite == 'SFN_Tot') %>%
  drop_na() %>%
  group_by(subject_id, cohort, treatment) %>%
  nest() %>%
  mutate(data = map_dbl(data, function(x) sum(x$uMol))) %>%
  ungroup() %>%
  left_join(sproutdata) %>%
  mutate(total_rec = (data/SFN_fed)*100)

cumsum_pct_L <- cumsum_pct %>% filter(treatment == 'BL')

fillvec_L <- rep('', length(cumsum_pct_L$data))
fillvec_L[which(cumsum_pct_L$data %in% head(cumsum_pct_L$data[order(cumsum_pct_L$data, decreasing = TRUE)], 8))] <- 'Highest Labeled'
fillvec_L[which(cumsum_pct_L$data %in% tail(cumsum_pct_L$data[order(cumsum_pct_L$data, decreasing = TRUE)], 8))] <- 'Lowest Labeled'
cumsum_pct_L$maxmin <- fillvec_L

cumsum_pct_U <- cumsum_pct %>% filter(treatment == 'BU')

fillvec_U <- rep('', length(cumsum_pct_U$data))
fillvec_U[which(cumsum_pct_U$data %in% head(cumsum_pct_U$data[order(cumsum_pct_U$data, decreasing = TRUE)], 8))] <- 'Highest Unlabeled'
fillvec_U[which(cumsum_pct_U$data %in% tail(cumsum_pct_U$data[order(cumsum_pct_U$data, decreasing = TRUE)], 8))] <- 'Lowest Unlabeled'
cumsum_pct_U$maxmin <- fillvec_U

csPCT_full <- rbind(cumsum_pct_L, cumsum_pct_U)

csoutpct <- csPCT_full %>% 
  dplyr::select(subject_id, cohort, treatment, data, total_rec, maxmin) %>%
  filter(maxmin != '') %>%
  mutate(total_rec = round(total_rec, 2)) %>%
  mutate(data = round(data, 3)) %>%
  mutate(treatment = ifelse(treatment == 'BL', 'Labeled', 'Unlabeled')) %>%
  arrange(subject_id) %>%
  rename('Participant' = subject_id, 'Cohort' = cohort, 'Cumulative % Recovered' = total_rec, 'Total SFN Metabolites Excreted' = data, 'Highest or Lowest Producer' = maxmin)

DT::datatable(csoutpct,
           filter = 'top',
           extensions = 'Buttons',
           options = list(pageLength = 12,
                          dom = 'Bfrtip',
                          scrollX = TRUE))  

For SFN-NAC, our top excreters are BSS017, BSS009, BSS003, BSS035, and BSS007.

For SFN-NIT, our top excreters are BSS005, BSS009, BSS006, BSS015, and BSS007.

Looking at the cumulative recovery of dose, the top excreters are BSS0035, BSS029, BSS005, BSS009, and BSS017.

There is quite a bit of overlap between our top NAC and NIT excreters and most of top excreters unsurprisingly come from Cohorts 1 and 2. BSS035, a top NAC excreter, is an interesting exception to this pattern.

When looking at cumulative recovery, we are seeing well over 100% of the fed dose being recovered. This is rather interesting but could be due to the conversion of erucin to SFN.

Let’s plot our highest and lowest excreters from the cumulative method:

High/Low NAC - Recovery
cs_final_NAC <- csNAC_full %>%
  filter(maxmin != '') %>%
  select(subject_id, maxmin) %>%
  left_join(urine_tidy) %>%
  mutate(treatment = ifelse(treatment == 'BL', 'Labeled', 'Unlabeled')) %>%
  filter(metabolite %in% c('SFN_NAC', 'SFN_NIT')) 
  
  
cs_plot_r <- ggplot(cs_final_NAC, aes(x = time, y = uMol, color = subject_id,  group = subject_id)) +
  geom_point(size = 2, aes(text = paste0('Participant ID: ', subject_id, '\n',
                               'Cohort: ', cohort, '\n',
                               'Group: ', maxmin, '\n',
                               'Time Point: ', time, ' hours \n',
                               'Recovered: ', round(uMol, 3), ' µMol \n'))) +
  geom_path(size = 0.5) +
  facet_wrap(~metabolite + treatment, ncol = 4) +
  theme_minimal_hgrid() +
  ylab('Metabolites Recovered (µmol)') +
  xlab('Time (Hours)') +
  scale_color_discrete(name = 'Participant \n ID')

plotly::ggplotly(cs_plot_r, tooltip = 'text')
High/Low NAC - Cumulative Excretion
cssum <- cs_final_NAC %>%
  group_by(subject_id, metabolite, maxmin) %>%
  mutate(cumsum = cumsum(uMol)) 


csplot_t <- ggplot(cssum, aes(x = time, y = cumsum, color = subject_id, group = subject_id)) + 
  geom_point(size = 2, aes(text = paste0('Participant ID: ', subject_id, '\n',
                               'Cohort: ', cohort, '\n',
                               'Group: ', maxmin, '\n',
                               'Time Point: ', time, ' hours \n',
                               'Cumulative Excretion: ', round(cumsum, 3), ' µMol \n'))) +
  geom_path(aes(group = subject_id), size =0.5) +
  facet_wrap(~metabolite + treatment, ncol = 4) +
  theme_minimal_hgrid() +
  ylab('Cumulative Recovery (µmol)') +
  xlab('Time (Hours)') +
  scale_color_discrete(name = 'Participant \n ID')

plotly::ggplotly(csplot_t, tooltip = 'text')
High/Low NIT - Recovery
cs_final_NIT <- csNIT_full %>%
  filter(maxmin != '') %>%
  select(subject_id, maxmin) %>%
  left_join(urine_tidy) %>%
  mutate(treatment = ifelse(treatment == 'BL', 'Labeled', 'Unlabeled')) %>%
  filter(metabolite %in% c('SFN_NAC', 'SFN_NIT')) 
  
cs_plot_r_nit <- ggplot(cs_final_NIT, aes(x = time, y = uMol, color = subject_id,  group = subject_id)) +
  geom_point(size = 2, aes(text = paste0('Participant ID: ', subject_id, '\n',
                               'Cohort: ', cohort, '\n',
                               'Group: ', maxmin, '\n',
                               'Time Point: ', time, ' hours \n',
                               'Recovered: ', round(uMol, 3), ' µMol \n'))) +
  geom_path(size = 0.5) +
  facet_wrap(~metabolite + treatment, ncol = 4) +
  theme_minimal_hgrid() +
  ylab('Metabolites Recovered (µmol)') +
  xlab('Time (Hours)') +
  scale_color_discrete(name = 'Participant \n ID')

plotly::ggplotly(cs_plot_r_nit, tooltip = 'text')
High/Low NIT - Cumulative Excretion
cssum_NIT <- cs_final_NIT %>%
  group_by(subject_id, metabolite, maxmin) %>%
  mutate(cumsum = cumsum(uMol)) 


csplot_t_nit <- ggplot(cssum_NIT, aes(x = time, y = cumsum, color = subject_id, group = subject_id)) + 
  geom_point(size = 2, aes(text = paste0('Participant ID: ', subject_id, '\n',
                               'Cohort: ', cohort, '\n',
                               'Group: ', maxmin, '\n',
                               'Time Point: ', time, ' hours \n',
                               'Cumulative Excretion: ', round(cumsum, 3), ' µMol \n'))) +
  geom_path(aes(group = subject_id), size = 0.5) +
  facet_wrap(~metabolite + treatment, ncol = 4) +
  theme_minimal_hgrid() +
  ylab('Cumulative Recovery (µmol)') +
  xlab('Time (Hours)') +
  scale_color_discrete(name = 'Participant \n ID')

plotly::ggplotly(csplot_t_nit, tooltip = 'text')
High/Low Total Metabolites - Recovery
cs_final_pct <- csPCT_full %>%
  filter(maxmin != '') %>%
  select(subject_id, maxmin) %>%
  left_join(pct_data) %>%
  mutate(treatment = ifelse(treatment == 'BL', 'Labeled', 'Unlabeled')) 
  
cs_plot_r_pct <- ggplot(cs_final_pct, aes(x = time, y = pct_rec, color = subject_id,  group = subject_id)) +
  geom_point(size = 2, aes(text = paste0('Participant ID: ', subject_id, '\n',
                               'Cohort: ', cohort, '\n',
                               'Group: ', maxmin, '\n',
                               'Time Point: ', time, ' hours \n',
                               'Recovered: ', round(SFN_Tot, 3), ' µMol \n',
                               'Dose Fed: ', SFN_fed, ' µMol \n',
                               '% Recovered: ', round(pct_rec*100, 2), '%'))) +
  geom_path(size = 0.5) +
  facet_wrap(~treatment, ncol = 4) +
  theme_minimal_hgrid() +
  ylab('Metabolites Recovered (µmol)') +
  xlab('Time (Hours)') +
  scale_color_discrete(name = 'Participant \n ID') +
  scale_y_continuous(labels = scales::percent_format())

plotly::ggplotly(cs_plot_r_pct, tooltip = 'text')
High/Low Total Metabolites - Cumulative
cssum_pct <- cs_final_pct %>%
  group_by(subject_id, maxmin) %>%
  mutate(cumsum_pct = cumsum(pct_rec),
         cumsum_sfn = cumsum(SFN_Tot)) 


csplot_t_pct <- ggplot(cssum_pct, aes(x = time, y = cumsum_pct, color = subject_id, group = subject_id)) + 
  geom_point(size = 2, aes(text = paste0('Participant ID: ', subject_id, '\n',
                               'Cohort: ', cohort, '\n',
                               'Group: ', maxmin, '\n',
                               'Time Point: ', time, ' hours \n',
                               'Cumulative Excretion: ', round(cumsum_sfn, 3), ' µMol \n',
                               'Dose Fed: ', SFN_fed, ' µMol \n',
                               'Cumulative %: ', round(cumsum_pct*100, 2), '%'))) +
  geom_path(aes(group = subject_id), size = 0.5) +
  facet_wrap(~treatment, ncol = 4) +
  theme_minimal_hgrid() +
  ylab('Cumulative Recovery (µmol)') +
  xlab('Time (Hours)') +
  scale_color_discrete(name = 'Participant \n ID') +
  scale_y_continuous(labels = scales::percent_format())

plotly::ggplotly(csplot_t_pct, tooltip = 'text')

Maximum Recovery

Next we will search across each individual and find their maximum NAC and NIT excretion then search across all individuals to find the highest excretion.

SFN-NAC
maxminNAC <- urine_tidy %>%
  filter(metabolite == 'SFN_NAC') %>%
  drop_na() %>%
  #Group by subject ID and break down into individual dataframes by participant
  group_by(subject_id, treatment) %>%
  nest() %>%
  #Pull out the time point with max recovery for each metabolite and each individual
  mutate(data = map(data, function(x) x[which(x$uMol == max(x$uMol)),])) %>%
  #Put the dataframe back together
  unnest(cols = data) %>%
  arrange(uMol) %>%
  ungroup()

maxminNAC_L <- maxminNAC %>% filter(treatment == 'BL')

fillvec_L <- rep('', length(maxminNAC_L$uMol))
fillvec_L[which(maxminNAC_L$uMol %in% head(maxminNAC_L$uMol[order(maxminNAC_L$uMol, decreasing = TRUE)], 8))] <- 'Highest Labeled'
fillvec_L[which(maxminNAC_L$uMol %in% tail(maxminNAC_L$uMol[order(maxminNAC_L$uMol, decreasing = TRUE)], 8))] <- 'Lowest Labeled'
maxminNAC_L$maxmin <- fillvec_L

maxminNAC_U <- maxminNAC %>% filter(treatment == 'BU')

fillvec_U <- rep('', length(maxminNAC_U$uMol))
fillvec_U[which(maxminNAC_U$uMol %in% head(maxminNAC_U$uMol[order(maxminNAC_U$uMol, decreasing = TRUE)], 8))] <- 'Highest Unlabeled'
fillvec_U[which(maxminNAC_U$uMol %in% tail(maxminNAC_U$uMol[order(maxminNAC_U$uMol, decreasing = TRUE)], 8))] <- 'Lowest Unlabeled'
maxminNAC_U$maxmin <- fillvec_U

mmNAC_full <- rbind(maxminNAC_L, maxminNAC_U)

outtableNAC <- mmNAC_full %>% 
  dplyr::select(subject_id, cohort, time, treatment, uMol, maxmin) %>%
  mutate(uMol = round(uMol, 3)) %>%
  mutate(treatment = ifelse(treatment == 'BL', 'Labeled', 'Unlabeled')) %>%
  filter(maxmin != '') %>%
  arrange(subject_id) %>%
  rename('Participant' = subject_id, 'Cohort' = cohort, 'Time Recovered' = time, 'Recovered µMol' = uMol, 'Highest or Lowest Producer' = maxmin, 'Label' = treatment)

DT::datatable(outtableNAC,
           filter = 'top',
           extensions = 'Buttons',
           options = list(pageLength = 12,
                          dom = 'Bfrtip',
                          scrollX = TRUE))  
SFN-NIT
maxminNIT <- urine_tidy %>%
  filter(metabolite == 'SFN_NIT') %>%
  drop_na() %>%
  #Group by subject ID and break down into individual dataframes by participant
  group_by(subject_id) %>%
  nest() %>%
  #Pull out the time point with max recovery for each metabolite and each individual
  mutate(data = map(data, function(x) x[which(x$uMol == max(x$uMol)),])) %>%
  #Put the dataframe back together
  unnest(cols = data) %>%
  ungroup()

maxminNIT_L <- maxminNIT %>% filter(treatment == 'BL')

fillvec_L <- rep('', length(maxminNIT_L$uMol))
fillvec_L[which(maxminNIT_L$uMol %in% head(maxminNIT_L$uMol[order(maxminNIT_L$uMol, decreasing = TRUE)], 8))] <- 'Highest Labeled'
fillvec_L[which(maxminNIT_L$uMol %in% tail(maxminNIT_L$uMol[order(maxminNIT_L$uMol, decreasing = TRUE)], 8))] <- 'Lowest Labeled'
maxminNIT_L$maxmin <- fillvec_L

maxminNIT_U <- maxminNIT %>% filter(treatment == 'BU')

fillvec_U <- rep('', length(maxminNIT_U$uMol))
fillvec_U[which(maxminNIT_U$uMol %in% head(maxminNIT_U$uMol[order(maxminNIT_U$uMol, decreasing = TRUE)], 8))] <- 'Highest Unlabeled'
fillvec_U[which(maxminNIT_U$uMol %in% tail(maxminNIT_U$uMol[order(maxminNIT_U$uMol, decreasing = TRUE)], 8))] <- 'Lowest Unlabeled'
maxminNIT_U$maxmin <- fillvec_U

mmNIT_full <- rbind(maxminNIT_L, maxminNIT_U)

outtableNIT <- mmNIT_full %>% 
  dplyr::select(subject_id, cohort, time, treatment, uMol, maxmin) %>%
  mutate(uMol = round(uMol, 3)) %>%
  mutate(treatment = ifelse(treatment == 'BL', 'Labeled', 'Unlabeled')) %>%
  filter(maxmin != '') %>%
  arrange(subject_id) %>%
  rename('Participant' = subject_id, 'Cohort' = cohort, 'Time Recovered' = time, 'Recovered µMol' = uMol, 'Highest or Lowest Producer' = maxmin, 'Label' = treatment)

DT::datatable(outtableNIT,
           filter = 'top',
           extensions = 'Buttons',
           options = list(pageLength = 12,
                          dom = 'Bfrtip',
                          scrollX = TRUE))  
Total SFN Metabolites
maxminPCT <-  pct_data%>%
  drop_na() %>%
  #Group by subject ID and break down into individual dataframes by participant
  group_by(subject_id) %>%
  nest() %>%
  #Pull out the time point with max recovery for each metabolite and each individual
  mutate(data = map(data, function(x) x[which(x$pct_rec == max(x$pct_rec)),])) %>%
  #Put the dataframe back together
  unnest(cols = data) %>%
  ungroup()

maxminPCT_L <- maxminPCT %>% filter(treatment == 'BL')

fillvec_L <- rep('', length(maxminPCT_L$pct_rec))
fillvec_L[which(maxminPCT_L$pct_rec %in% head(maxminPCT_L$pct_rec[order(maxminPCT_L$pct_rec, decreasing = TRUE)], 8))] <- 'Highest Labeled'
fillvec_L[which(maxminPCT_L$pct_rec %in% tail(maxminPCT_L$pct_rec[order(maxminPCT_L$pct_rec, decreasing = TRUE)], 8))] <- 'Lowest Labeled'
maxminPCT_L$maxmin <- fillvec_L

maxminPCT_U <- maxminPCT %>% filter(treatment == 'BU')

fillvec_U <- rep('', length(maxminPCT_U$pct_rec))
fillvec_U[which(maxminPCT_U$pct_rec %in% head(maxminPCT_U$pct_rec[order(maxminPCT_U$pct_rec, decreasing = TRUE)], 8))] <- 'Highest Unlabeled'
fillvec_U[which(maxminPCT_U$pct_rec %in% tail(maxminPCT_U$pct_rec[order(maxminPCT_U$pct_rec, decreasing = TRUE)], 8))] <- 'Lowest Unlabeled'
maxminPCT_U$maxmin <- fillvec_U

mmPCT_full <- rbind(maxminPCT_L, maxminPCT_U)

outtablePCT <- mmPCT_full %>% 
  dplyr::select(subject_id, cohort, time, treatment,SFN_Tot, pct_rec, maxmin) %>%
  mutate(SFN_Tot = round(SFN_Tot, 3)) %>%
  mutate(treatment = ifelse(treatment == 'BL', 'Labeled', 'Unlabeled')) %>%
  filter(maxmin != '') %>%
  arrange(subject_id) %>%
  mutate(pct_rec = round(pct_rec*100, 2)) %>%
  rename('Participant' = subject_id, 'Cohort' = cohort, 'Time Recovered' = time, 'Recovered %' = pct_rec, 'Highest or Lowest Producer' = maxmin, 'Total SFN Metabolites Recovered' = SFN_Tot, 'Label' = treatment)
  
DT::datatable(outtablePCT,
           filter = 'top',
           extensions = 'Buttons',
           options = list(pageLength = 12,
                          dom = 'Bfrtip',
                          scrollX = TRUE))  

Once again, we see quite a bit of overlap between our high NIT and NAC producers.

The highest NAC producers are BSS009, BSS017, BSS003, BSS007, and BSS006.

Similarly the highest NIT producers are BSS005, BSS006, BSS009, BSS015, and BSS017.

The highest percent recovery at a single timepoint was in BSS009, BSS029, BSS005, BSS006, and BSS035.

We can now plot our results to see how they look:

High/Low NAC - Recovery
mx_final <- mmNAC_full %>%
  filter(maxmin != '') %>%
  select(subject_id, maxmin) %>%
  left_join(urine_tidy) %>%
  mutate(treatment = ifelse(treatment == 'BL', 'Labeled', 'Unlabeled')) %>%
  filter(metabolite %in% c('SFN_NAC', 'SFN_NIT'))
  
  
mx_plot_r <- ggplot(mx_final, aes(x = time, y = uMol, color = subject_id,  group = subject_id)) +
  geom_point(size = 2, aes(text = paste0('Participant ID: ', subject_id, '\n',
                               'Cohort: ', cohort, '\n',
                               'Group: ', maxmin, '\n',
                               'Time Point: ', time, ' hours \n',
                               'Recovered: ', round(uMol,3), ' µMol \n'))) +
  geom_path(size = 0.5) +
  facet_wrap(~metabolite + treatment, ncol = 4) +
  theme_minimal_hgrid() +
  ylab('Metabolites Recovered (µmol)') +
  xlab('Time (Hours)') +
  scale_color_discrete(name = 'Participant \n ID')

plotly::ggplotly(mx_plot_r, tooltip = 'text')
High/Low NAC - Cumulative
mxsum <- mx_final %>%
  group_by(subject_id, metabolite, maxmin, treatment) %>%
  mutate(cumsum = cumsum(uMol)) 


mxplot_t <- ggplot(mxsum, aes(x = time, y = cumsum, color = subject_id, group = subject_id)) + 
  geom_point(size = 2, aes(text = paste0('Participant ID: ', subject_id, '\n',
                               'Cohort: ', cohort, '\n',
                               'Group: ', maxmin, '\n',
                               'Time Point: ', time, ' hours \n',
                               'Recovered: ', round(cumsum,3), ' µMol \n'))) +
  geom_path(size = 0.5) +
  facet_wrap(~metabolite + treatment, ncol = 4) +
  theme_minimal_hgrid() +
  ylab('Cumulative Recovery (µmol)') +
  xlab('Time (Hours)') +
  scale_color_discrete(name = 'Participant \n ID')

plotly::ggplotly(mxplot_t, tooltip = 'text')
High/Low NIT - Recovery
mxfinal_NIT <- mmNIT_full %>%
  filter(maxmin != '') %>%
  select(subject_id, maxmin) %>%
  left_join(urine_tidy) %>%
  mutate(treatment = ifelse(treatment == 'BL', 'Labeled', 'Unlabeled')) %>%
  filter(metabolite %in% c('SFN_NAC', 'SFN_NIT')) 
  
  
mx_plot_r_nit <- ggplot(mxfinal_NIT, aes(x = time, y = uMol, color = subject_id,  group = subject_id)) +
  geom_point(size = 2, aes(text = paste0('Participant ID: ', subject_id, '\n',
                               'Cohort: ', cohort, '\n',
                               'Group: ', maxmin, '\n',
                               'Time Point: ', time, ' hours \n',
                               'Recovered: ', round(uMol,3), ' µMol \n'))) +
  geom_path(size = 0.5) +
  facet_wrap(~metabolite+treatment, ncol = 4) +
  theme_minimal_hgrid() +
  ylab('Metabolites Recovered (µmol)') +
  xlab('Time (Hours)') +
  scale_color_discrete(name = 'Participant \n ID')


plotly::ggplotly(mx_plot_r_nit, tooltip = 'text')
High/Low NIT - Cumulative Excretion
mxsum_NIT <- mxfinal_NIT %>%
  group_by(subject_id, metabolite, maxmin) %>%
  mutate(cumsum = cumsum(uMol)) 

mxplot_t_nit <- ggplot(mxsum_NIT, aes(x = time, y = cumsum, color = subject_id, group = subject_id)) + 
  geom_point(size = 2, aes(text = paste0('Participant ID: ', subject_id, '\n',
                               'Cohort: ', cohort, '\n',
                               'Group: ', maxmin, '\n',
                               'Time Point: ', time, ' hours \n',
                               'Cumulative Excretion: ', round(cumsum, 3), ' µMol \n'))) +
  geom_path(aes(group = subject_id), size = 0.5) +
  facet_wrap(~metabolite + treatment, ncol = 4) +
  theme_minimal_hgrid() +
  ylab('Cumulative Recovery (µmol)') +
  xlab('Time (Hours)') +
  scale_color_discrete(name = 'Participant \n ID')

plotly::ggplotly(mxplot_t_nit, tooltip = 'text')
High/Low Total Metabolites - Recovery
mxfinal_PCT <- mmPCT_full %>%
  filter(maxmin != '') %>%
  select(subject_id, maxmin) %>%
  left_join(pct_data) %>%
  mutate(treatment = ifelse(treatment == 'BL', 'Labeled', 'Unlabeled')) 

mx_plot_r_pct <- ggplot(mxfinal_PCT, aes(x = time, y = pct_rec, color = subject_id,  group = subject_id)) +
  geom_point(size = 2, aes(text = paste0('Participant ID: ', subject_id, '\n',
                               'Cohort: ', cohort, '\n',
                               'Group: ', maxmin, '\n',
                               'Time Point: ', time, ' hours \n',
                               'Recovered: ', round(SFN_Tot,3), ' µMol \n',
                               'Dose Fed: ', SFN_fed, ' µMol \n',
                               '% Dose: ', round(pct_rec*100, 2), '%'))) +
  geom_path(size = 0.5) +
  facet_wrap(~treatment, ncol = 4) +
  theme_minimal_hgrid() +
  ylab('Metabolites Recovered (µmol)') +
  xlab('Time (Hours)') +
  scale_color_discrete(name = 'Participant \n ID') +
  scale_y_continuous(labels = scales::percent_format())


plotly::ggplotly(mx_plot_r_pct, tooltip = 'text')
High/Low Total Metabolites - Cumulative
mxsum_PCT <- mxfinal_PCT %>%
  group_by(subject_id, maxmin) %>%
  mutate(cumsum_pct = cumsum(pct_rec),
         cumsum_sfn = cumsum(SFN_Tot)) 

mxplot_t_pct <- ggplot(mxsum_PCT, aes(x = time, y = cumsum_pct, color = subject_id, group = subject_id)) + 
  geom_point(size = 2, aes(text = paste0('Participant ID: ', subject_id, '\n',
                               'Cohort: ', cohort, '\n',
                               'Group: ', maxmin, '\n',
                               'Time Point: ', time, ' hours \n',
                               'Cumulative SFN Excretion: ', round(cumsum_sfn, 3), ' µMol \n',
                               'Dose Fed: ', SFN_fed, ' µMol \n',
                               'Cumulative Dose Excretion: ', round(cumsum_pct*100, 2), '%'))) +
  geom_path(aes(group = subject_id), size = 0.5) +
  facet_wrap(~treatment, ncol = 4) +
  theme_minimal_hgrid() +
  ylab('Cumulative Dose Recovered') +
  xlab('Time (Hours)') +
  scale_color_discrete(name = 'Participant \n ID') +
  scale_y_continuous(labels = scales::percent_format())
  
plotly::ggplotly(mxplot_t_pct, tooltip = 'text')

Which individuals do want to investigate further for our metabolomics experiment? Which method do we want to use to select our participants? Cumulative excretion or maximum recovery? Do we want to focus on the NIT or NAC list?

Question 5: What is the relationship between SFN consumed and excreted?

Now let’s see if there is a correlation between total SFN excretion and the amount eaten. We will check for each metabolite:

sproutcor <- urine_tidy %>%
  filter(metabolite %in% c('SFN_Tot', 'SFN_NIT', 'SFN_NAC')) %>%
  drop_na() %>%
  group_by(subject_id, cohort, metabolite, treatment) %>%
  nest() %>%
  mutate(data = map_dbl(data, function(x) sum(x$uMol))) %>%
  rename('uMol' = data) %>%
  ungroup() %>%
  left_join(sproutdata) %>%
  pivot_longer(cols = c('SFN_fed', 'grams_fed'), names_to = 'measure') %>%
  group_by(measure, metabolite) %>%
  nest() %>%
  mutate(rval = map_dbl(data, function(x) cor(x$value, x$uMol))) %>%
  unnest(col = 'data') %>%
  pivot_wider(names_from = 'measure', values_from = 'value') %>%
  pivot_wider(names_from = 'metabolite', values_from = 'uMol')

SFN-NAC
p1_NAC <- ggplot(sproutcor, aes(x = SFN_fed, y = SFN_NAC, color = cohort, group = 1)) + 
  geom_point(aes(text = paste0('Participant: ', subject_id, '\n',
                               'Treatment: ', treatment, '\n',
                               'Cohort: ', cohort, '\n',
                               'Total Excreted: ', round(SFN_NAC, 3), ' µMol \n',
                               'SFN-Equivalents Fed: ', SFN_fed, ' µMol \n')))+
  geom_smooth(method = 'lm', se = F, color = 'black',
              aes(text = paste0("Pearon's r: ", round(rval, 3)))) +
  cowplot::theme_minimal_grid() +
  xlab('µMol SFN Fed') +
  ylab('Total SFN-NAC Recovered (µMol)') +
  theme(legend.position = 'none')
p1_NAC <- plotly::ggplotly(p1_NAC, tooltip = 'text')

p2_NAC <- ggplot(sproutcor, aes(x = grams_fed, y = SFN_NAC, color = cohort, group = 1)) + 
  geom_point(aes(text = paste0('Participant: ', subject_id, '\n',
                               'Treatment: ', treatment, '\n',
                               'Cohort: ', cohort, '\n',
                               'Total Excreted: ', round(SFN_NAC, 3), ' µMol \n',
                               'Grams Sprouts Fed: ', grams_fed, ' g \n')))+
  geom_smooth(method = 'lm', se = F, color = 'black',
              aes(text = paste0("Pearon's r: ", round(rval, 3))))  +
  cowplot::theme_minimal_grid() +
  xlab('Grams of Sprouts Fed') +
  ylab('') +
  theme(legend.position = 'none')
p2_NAC <- plotly::ggplotly(p2_NAC, tooltip = 'text')


plotly::subplot(p1_NAC, p2_NAC, titleY = TRUE, titleX = TRUE, shareY = T) 
SFN-NIT
p1_NIT <- ggplot(sproutcor, aes(x = SFN_fed, y = SFN_NIT, color = cohort, group = 1)) + 
  geom_point(aes(text = paste0('Participant: ', subject_id, '\n',
                               'Treatment: ', treatment, '\n',
                               'Cohort: ', cohort, '\n',
                               'Total Excreted: ', round(SFN_NIT, 3), ' µMol \n',
                               'SFN-Equivalents Fed: ', SFN_fed, ' µMol \n')))+
  geom_smooth(method = 'lm', se = F, color = 'black',
              aes(text = paste0("Pearon's r: ", round(rval, 3)))) +
  cowplot::theme_minimal_grid() +
  xlab('µMol SFN Fed') +
  ylab('Total SFN-NIT Recovered (µMol)') +
  theme(legend.position = 'none')
p1_NIT <- plotly::ggplotly(p1_NIT, tooltip = 'text')

p2_NIT <- ggplot(sproutcor, aes(x = grams_fed, y = SFN_NIT, color = cohort, group = 1)) + 
  geom_point(aes(text = paste0('Participant: ', subject_id, '\n',
                               'Treatment: ', treatment, '\n',
                               'Cohort: ', cohort, '\n',
                               'Total Excreted: ', round(SFN_NIT, 3), ' µMol \n',
                               'Grams Sprouts Fed: ', grams_fed, ' g \n')))+
  geom_smooth(method = 'lm', se = F, color = 'black',
              aes(text = paste0("Pearon's r: ", round(rval, 3))))  +
  cowplot::theme_minimal_grid() +
  xlab('Grams of Sprouts Fed') +
  ylab('') +
  theme(legend.position = 'none')
p2_NIT <- plotly::ggplotly(p2_NIT, tooltip = 'text')


plotly::subplot(p1_NIT, p2_NIT, titleY = TRUE, titleX = TRUE, shareY = T) 
Total SFN Metabolites
p1_Tot <- ggplot(sproutcor, aes(x = SFN_fed, y = SFN_Tot, color = cohort, group = 1)) + 
  geom_point(aes(text = paste0('Participant: ', subject_id, '\n',
                               'Treatment: ', treatment, '\n',
                               'Cohort: ', cohort, '\n',
                               'Total Excreted: ', round(SFN_Tot, 3), ' µMol \n',
                               'SFN-Equivalents Fed: ', SFN_fed, ' µMol \n')))+
  geom_smooth(method = 'lm', se = F, color = 'black',
              aes(text = paste0("Pearon's r: ", round(rval, 3)))) +
  cowplot::theme_minimal_grid() +
  xlab('µMol SFN Fed') +
  ylab('Total SFN Metabolites Recovered (µMol)') +
  theme(legend.position = 'none')
p1_Tot <- plotly::ggplotly(p1_Tot, tooltip = 'text')

p2_Tot <- ggplot(sproutcor, aes(x = grams_fed, y = SFN_Tot, color = cohort, group = 1)) + 
  geom_point(aes(text = paste0('Participant: ', subject_id, '\n',
                               'Treatment: ', treatment, '\n',
                               'Cohort: ', cohort, '\n',
                               'Total Excreted: ', round(SFN_Tot, 3), ' µMol \n',
                               'Grams Sprouts Fed: ', grams_fed, ' g \n')))+
  geom_smooth(method = 'lm', se = F, color = 'black',
              aes(text = paste0("Pearon's r: ", round(rval, 3))))  +
  cowplot::theme_minimal_grid() +
  xlab('Grams of Sprouts Fed') +
  ylab('') +
  theme(legend.position = 'none')
p2_Tot <- plotly::ggplotly(p2_Tot, tooltip = 'text')


plotly::subplot(p1_Tot, p2_Tot, titleY = TRUE, titleX = TRUE, shareY = T) 

There is a pretty clear relationship between the grams of sprouts fed and SFN metabolites recovered. Cohorts 4-8 are definitely interesting and show really high variation which is interesting. Let’s focus in on only those participants briefly and look at their total SFN metabolites:

cfocus <- sproutcor %>%
  filter(cohort %in% 4:8)

p1_f <- ggplot(cfocus, aes(x = SFN_fed, y = SFN_Tot, color = cohort, group = 1)) + 
  geom_point(aes(text = paste0('Participant: ', subject_id, '\n',
                               'Treatment: ', treatment, '\n',
                               'Cohort: ', cohort, '\n',
                               'Total Excreted: ', round(SFN_Tot, 3), ' µMol \n',
                               'SFN-Equivalents Fed: ', SFN_fed, ' µMol \n')))+
  geom_smooth(method = 'lm', se = F, color = 'black',
              aes(text = paste0("Pearon's r: ", round(rval, 3)))) +
  cowplot::theme_minimal_grid() +
  xlab('µMol SFN Fed') +
  ylab('Total SFN Metabolites Recovered (µMol)') +
  theme(legend.position = 'none') +
  scale_x_continuous(breaks = c(100), limits = c(100,100))
p1_f <- plotly::ggplotly(p1_f, tooltip = 'text')

p2_f <- ggplot(cfocus, aes(x = grams_fed, y = SFN_Tot, color = cohort, group = 1)) + 
  geom_point(aes(text = paste0('Participant: ', subject_id, '\n',
                               'Treatment: ', treatment, '\n',
                               'Cohort: ', cohort, '\n',
                               'Total Excreted: ', round(SFN_Tot, 3), ' µMol \n',
                               'Grams Sprouts Fed: ', grams_fed, ' g \n')))+
  geom_smooth(method = 'lm', se = F, color = 'black',
              aes(text = paste0("Pearon's r: ", round(rval, 3))))  +
  cowplot::theme_minimal_grid() +
  xlab('Grams of Sprouts Fed') +
  ylab('') +
  theme(legend.position = 'none')
p2_f <- plotly::ggplotly(p2_f, tooltip = 'text')


plotly::subplot(p1_f, p2_f, titleY = TRUE, titleX = TRUE, shareY = T) 

It does look like there is a relationship between the amount of sprouts consumed and the recovery of SFN-metabolites. Hopefully by using the hierarchical model it will help to control for this - definitely need to have more conversations with Duo about this issue. This is an interesting finding tho.